AITopics | video-based human-object interaction detection

Video-based Human-Object Interaction Detection from Tubelet Tokens

Neural Information Processing Systems, Dec-24-2025, 19:53:40 GMT

We present TUTOR, a novel vision Transformer that learns tubelet tokens, which serve as highly abstracted spatial-temporal representations, for video-based human-object interaction (V-HOI) detection. The tubelet tokens structure videos by agglomerating and linking semantically related patch tokens along the spatial and temporal domains, which brings two benefits: 1) Compactness: each token is learned by a selective attention mechanism that reduces redundant dependencies on other tokens; 2) Expressiveness: each token can align with a semantic instance, i.e., an object or a human, thanks to agglomeration and linking. Extensive experiments verify the effectiveness and efficiency of TUTOR. Results show that the method outperforms existing works by large margins, with a relative mAP gain of 16.14% on VidHOI and a 2-point gain on CAD-120, as well as a 4× speedup.
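The abstract reports one result as a relative gain (a percentage of the baseline score) and another as an absolute gain (points on the metric). The two are easy to confuse, so the sketch below shows how each would be computed. The function names and the scores are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def relative_gain(baseline: float, new: float) -> float:
    """Improvement of `new` over `baseline`, as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def absolute_gain(baseline: float, new: float) -> float:
    """Improvement of `new` over `baseline`, in points of the metric."""
    return new - baseline


# Hypothetical scores for illustration only (NOT results from the paper).
baseline_map = 20.0
new_map = 25.0

print(f"absolute gain: {absolute_gain(baseline_map, new_map):.2f} points")
print(f"relative gain: {relative_gain(baseline_map, new_map):.2f}%")
```

With these made-up numbers, a 5-point absolute gain corresponds to a 25% relative gain, which is why the two figures in the abstract cannot be compared directly without knowing the baselines.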

name change, tubelet token, video-based human-object interaction detection, (3 more...)

Technology: Information Technology > Artificial Intelligence (0.79)

Appendix for Video-based Human-Object Interaction Detection from Tubelet Tokens

Danyang Tu, Wei Sun

Neural Information Processing Systems, Aug-17-2025, 01:52:48 GMT

This document contains the appendix for "Video-based Human-Object Interaction Detection from Tubelet Tokens".

artificial intelligence, machine learning, video-based human-object interaction detection, (14 more...)

Country: Asia > China > Shanghai > Shanghai (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Video-based Human-Object Interaction Detection from Tubelet Tokens

Neural Information Processing Systems, Jan-17-2025, 21:35:14 GMT

We present TUTOR, a novel vision Transformer that learns tubelet tokens, which serve as highly abstracted spatial-temporal representations, for video-based human-object interaction (V-HOI) detection. The tubelet tokens structure videos by agglomerating and linking semantically related patch tokens along the spatial and temporal domains, which brings two benefits: 1) Compactness: each token is learned by a selective attention mechanism that reduces redundant dependencies on other tokens; 2) Expressiveness: each token can align with a semantic instance, i.e., an object or a human, thanks to agglomeration and linking. Extensive experiments verify the effectiveness and efficiency of TUTOR. Results show that the method outperforms existing works by large margins, with a relative mAP gain of 16.14% on VidHOI and a 2-point gain on CAD-120, as well as a 4× speedup.

tubelet token, tutor, video-based human-object interaction detection

Technology: Information Technology > Artificial Intelligence (0.85)
